A System to Remove Duplicated Information in Hypertext



Overview: 

Most information currently available in HTML and other mark-up languages has been designed for display on a
desktop computer monitor with a typical resolution of 640 by 480 or 1024 by 768 pixels. A typical small-screen
device has a resolution of only 120 by 90 pixels. This system has been designed to re-process the original
document into a format that is easier to interpret and understand on a small-screen device.

This system has been designed to convert information published in a hypertext mark-up language into a format
more suitable for a small-screen device. In a typical installation, the hypertext language would be HTML and the
destination device would be a PDA (Personal Digital Assistant) or mobile phone.

The system can be used with any mark-up language and can work both locally and across a network.



Designers of computerised hypertext often repeat information on many pages of text. Replication of this kind can
lead to longer technical delays, such as download time, and to longer reading times for readers. This is especially
true on small-screen devices such as mobile telephones and Personal Digital Assistants (PDAs). An example is a
navigational toolbar that appears on every page of a site on the World Wide Web. In cases where the hypertext
designer favoured large and powerful computers, it may be almost impossible to access the hypertext on small
devices, even if such portability was originally intended.



Hypertext documents are viewed in some sequence by each reader, moving from one to another by choosing 
'links' within each page. Where some information is presented on an early page and then ignored by the reader, it 
might be reasonable to assume that they are not interested in it. Also, many modern hypertext document systems 
(sometimes called 'web sites') are designed in a hierarchical form. There may be pages to list the sections of the 
web site, and more to list each sub-section, followed by pages containing actual content. Either the historical
tracking of a user's reading or, if no tracking information has been recorded for that reader, such a hierarchy
could be employed to assist Argo's invention in guessing which pages a reader should already have read.

Solution:

Argo proposes a system of computer software, through which users are required to fetch hypertext documents
that they wish to read. Typically this is in the form of an intermediate 'proxy server', but a stand-alone mode of
operation can also be envisaged. The system processes the hypertext pages as they are transferred from the
storage location to the reader, removing parts, recording what it has found, and performing other tasks.

Once a hypertext document has been requested by the user and subsequently received by the system, Argo's
system examines the hierarchy in which the page exists on the basis of the document's Uniform Resource
Identifier (URI). This URI, or some similar information appropriate to the hypertext system being used, should
uniquely identify the page and may provide some information about the hierarchy in which it exists. Argo's
invention fetches each page that is above the requested one in the hierarchy (sometimes called 'parent' pages),
and makes a note of discrete units of information on each page. It may only note links to other pages, but divisions
of other information such as images or footnotes can also be envisaged. If the reader's activity is being recorded,
then pages they have already viewed may be considered instead of parent pages of the current document.
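
As an illustration of how a URI might yield the 'parent' pages, the following Python sketch assumes that the
hierarchy of the site mirrors the path of the URI, so that each parent is found by dropping one trailing path
segment. That is only an assumption: the text says merely that the URI may provide some information about the
hierarchy. The function name parent_uris and the example address are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_uris(uri):
    """Return candidate parent page URIs, nearest parent first.

    Assumption (not from the source): the site hierarchy mirrors the
    URI path, so each parent is one path segment shorter.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    # Drop one trailing segment at a time until only the site root remains.
    while segments:
        segments.pop()
        path = "/" + "/".join(segments)
        if segments:
            path += "/"
        # Query string and fragment are discarded for parent pages.
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


if __name__ == "__main__":
    for parent in parent_uris("http://example.com/products/accelerators/dzy93f.html"):
        print(parent)
```

A site whose hierarchy is not visible in its URIs would need another source for this information, such as the
reader's recorded history.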

Once a note has been made of the information units on each page, those units that are present on parent pages 
are removed from the one requested by the reader. One or more new links are added to the current page to 
ensure that the reader has the opportunity to return to pages which do contain the links, should they wish to use 
them. 
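
The removal step described above can be sketched in Python as follows. This is a minimal illustration rather
than Argo's implementation: it treats only links as units of information, identifies a link by its href
attribute, and returns the list of links the abbreviated page would carry instead of rewriting the HTML itself.
The names LinkCollector, links_in and abbreviate, and the file names in the example, are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Note each distinct link (href of an <a> element) on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None and href not in self.hrefs:
                self.hrefs.append(href)


def links_in(html_text):
    """Return the distinct link targets found in html_text, in page order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


def abbreviate(page_links, parent_pages):
    """Remove links already present on parent pages and add return links.

    page_links   -- link targets on the requested page
    parent_pages -- list of (uri, link targets) pairs, topmost page first
    """
    # Remember the topmost page on which each link appears.
    first_seen_on = {}
    for parent_uri, parent_links in parent_pages:
        for href in parent_links:
            first_seen_on.setdefault(href, parent_uri)

    kept = [href for href in page_links if href not in first_seen_on]

    # One return link per page that still holds a removed link.
    return_links = []
    for href in page_links:
        source = first_seen_on.get(href)
        if source is not None and source not in return_links and source not in kept:
            return_links.append(source)
    return return_links + kept


if __name__ == "__main__":
    page_a = ('<a href="a.html">Home</a> <a href="news.html">News</a> '
              '<a href="products.html">Products</a>')
    page_g = page_a + ' <a href="about.html">About</a>'
    print(abbreviate(links_in(page_g), [("a.html", links_in(page_a))]))
    # prints ['a.html', 'about.html']
```

The return link points at the topmost page on which a removed link appears, because lower pages may themselves
have had that link removed.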

The advantage of this procedure is that each document will be reduced to a more manageable size without
removing significant information from it, and without requiring special preparation by the hypertext author. This is
important for small devices, which are technically limited and very different from the equipment used by the
majority of readers for whom such authors write.

If the system is configured to work with a historical record of pages viewed by the reader, the oldest page
considered as part of the link removal may be the first page seen, the first seen within a certain time such as ten
minutes, or the nth last page, perhaps the tenth last. It would not consider any page viewed after the first viewing
of the current page (nor of course would it treat the current page as a previous one). This ensures that if the user
goes 'Back' to a previous page, they will not lose all of the links on it.
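
The selection of previously viewed pages can be sketched as follows, assuming the history is held as a list of
(timestamp in seconds, page) pairs in viewing order. The function name pages_to_consider and the parameters
max_pages and max_age are hypothetical; they stand for the 'nth last page' and 'certain time' limits mentioned
above.

```python
def pages_to_consider(history, current, now, max_pages=None, max_age=None):
    """Return the previously viewed pages to compare the current page against.

    history   -- list of (timestamp_seconds, page) pairs, oldest first
    current   -- the page being requested
    now       -- current time in seconds
    max_pages -- if given, consider at most this many of the most recent pages
    max_age   -- if given, ignore pages viewed more than this many seconds ago
    """
    # Ignore everything from the first viewing of the current page onwards,
    # so that going 'Back' does not strip the links from an earlier page.
    pages = [page for _, page in history]
    cutoff = pages.index(current) if current in pages else len(history)
    earlier = history[:cutoff]

    if max_age is not None:
        earlier = [(t, page) for t, page in earlier if now - t <= max_age]
    if max_pages is not None:
        earlier = earlier[-max_pages:] if max_pages > 0 else []
    return [page for _, page in earlier]


if __name__ == "__main__":
    history = [(0, "A"), (60, "C"), (120, "G")]
    # The reader goes 'Back' from G to C: only A is considered, never G.
    print(pages_to_consider(history, "C", now=180))   # prints ['A']
```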

The first diagram shows the structure of an imaginary web site. If a user requests page 'G', Argo's system will
compare it with documents 'C' and 'A' before delivering the abbreviated version of 'G'. If Argo's system is
operating on the basis of the reader's previous actions rather than the web site hierarchy, then all (or a certain
number of the most recent) pages previously viewed will be considered instead.

The second diagram shows the same pages, as they would be processed by Argo's system. The top row shows
the original hypertext pages, and the bottom row shows how those pages might appear after processing.

Original document 'A'

Home
News
Products

Welcome to Advanced Nuclear
Backpack Systems

Special offer of the day
Want a quote?



Processed document 'A'

Home
News
Products

Welcome to Advanced Nuclear
Backpack Systems

Special offer of the day
Want a quote?



Original document 'C'

Home
News
Products

Laser Systems
LY22
DY33

Particle Accelerators
DZY93A



Processed document 'C'

Home (to document 'A')

Laser Systems
LY23

Particle Accelerators
DZY93A
DZY93F



Original document 'G'

Home
News
Products

DZY93F particle accelerator
The DZY93F particle accelerator
is designed for use in today's
heavy industrial chemistry
laboratories

About Particle Accelerators



Processed document 'G'

Home (to document 'A')

DZY93F particle accelerator
The DZY93F particle accelerator
is designed for use in today's
heavy industrial chemistry
laboratories

About Particle Accelerators



As the second diagram shows, duplicated links that appear on lower level pages are removed and replaced with 
links to parent pages. Links on pages high in the hierarchy (or viewed earlier) are kept regardless of how they 
appear on lower (or later) pages. 